A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window

نویسندگان

  • Costas Busch
  • Srikanta Tirthapura
چکیده

We consider the problem of maintaining aggregates over recent elements of a massive data stream. Motivated by applications involving network data, we consider asynchronous data streams, where the observed order of data may be different from the order in which the data was generated. The set of recent elements is modeled as a sliding timestamp window of the stream, whose elements are changing continuously with time. We present the first deterministic algorithms for maintaining a small space summary of elements in a sliding timestamp window of an asynchronous data stream. The summary can return approximate answers for the following fundamental aggregates: basic count, the number of elements within the sliding window, and sum, the sum of all element values within the sliding window. For basic counting, the space taken by our summary is O(log W · log B · (log W + log B)/ ) bits, where B is an upper bound on the value of the basic count, W is an upper bound on the width of the timestamp window, and is the desired relative error. Our algorithms are based on a novel data structure called splittable histogram. Prior to this work, randomized algorithms were known for this problem, which provide weaker guarantees than those provided by our deterministic algorithms. Classification: Algorithms and Data Structures, Data Stream Processing.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Todays, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection, work either on uncertai...

متن کامل

Design of a Sliding Window over Asynchronous Event Streams

The proliferation of sensing and monitoring applications motivates adoption of the event stream model of computation. Though sliding windows are widely used to facilitate effective event stream processing, it is greatly challenged when the event sources are distributed and asynchronous. To address this challenge, we first show that the snapshots of the asynchronous event streams within the slid...

متن کامل

DENGRIS-Stream: A Density-Grid based Clustering Algorithm for Evolving Data Streams over Sliding Window

Evolving data streams are ubiquitous. Various clustering algorithms have been developed to extract useful knowledge from evolving data streams in real time. Density-based clustering method has the ability to handle outliers and discover arbitrary shape clusters whereas grid-based clustering has high speed processing time. Sliding window is a widely used model for data stream mining due to its e...

متن کامل

Mining Recent Frequent Itemsets in Sliding Windows over Data Streams

This paper considers the problem of mining recent frequent itemsets over data streams. As the data grows without limit at a rapid rate, it is hard to track the new changes of frequent itemsets over data streams. We propose an efficient one-pass algorithm in sliding windows over data streams with an error bound guarantee. This algorithm does not need to refer to obsolete transactions when 316 C....

متن کامل

RS-Algo: an Algorithm for Improved Memory Utilization in Continuous Query System under Asynchronous Data Streams

Continuous query systems are an intuitive way for users to access data streaming data in large scale scientific applications containing hundreds if not thousands of streams. This paper focuses on optimizations for improved memory utilization under the large scale, asynchronous streams found in on-demand meteorological forecasting and other data-driven applications. Specifically, we experimental...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007